Text Categorization Using Neural Networks Initialized with Decision Trees

Authors

  • Nerijus Remeikis
  • Ignas Skucas
  • Vida Melninkaite
Abstract

Text categorization, the assignment of natural language documents to one or more predefined categories based on their semantic content, is an important component of many information organization and management tasks. Neural network learning is known to be sensitive to the initial weights and the architecture. This paper discusses initializing a multilayer neural network with a decision tree classifier to improve text categorization accuracy. The decision tree path from the root node to a final leaf is used to initialize each single unit. Growing decision trees on increasingly large amounts of training data results in larger trees, so the neural networks constructed from these trees are often larger and more complex than necessary. An appropriate choice of certainty factor produces trees that stay essentially constant in size as the training set grows. Experimental results support the conclusion that error-based pruning can produce appropriately sized trees, which are mapped directly to an optimal neural network architecture with good accuracy. The experimental evaluation shows that this approach gives better classification accuracy on the Reuters-21578 corpus, one of the standard benchmarks for text categorization tasks. We present results comparing the accuracy of this approach with that of a multilayer neural network initialized with the traditional random method and with that of decision tree classifiers.
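The abstract does not spell out how a root-to-leaf path becomes the initial weights of a unit, so the following is only a minimal sketch of one plausible mapping, not the authors' scheme. It assumes binary term-presence features, and the names `TOY_TREE`, `paths_to_leaves`, `init_hidden_units` and the `scale` parameter are hypothetical. Each path is treated as a conjunction of "term present" and "term absent" tests, which a single sigmoid unit can approximate.

```python
import math

# Hypothetical toy tree over binary term features (1 = term present).
# Internal node: (feature_index, subtree_if_absent, subtree_if_present).
# Leaf: a class label (string).
TOY_TREE = (0, (1, "other", "grain"), "earn")


def paths_to_leaves(tree, prefix=()):
    """Yield (conditions, label) for every root-to-leaf path.

    conditions is a tuple of (feature_index, required_value) pairs.
    """
    if not isinstance(tree, tuple):
        yield prefix, tree
        return
    feature, absent, present = tree
    yield from paths_to_leaves(absent, prefix + ((feature, 0),))
    yield from paths_to_leaves(present, prefix + ((feature, 1),))


def init_hidden_units(tree, n_features, scale=4.0):
    """Create one hidden unit per path (assumed mapping, binary inputs only).

    A feature required to be present gets weight +scale, a feature required
    to be absent gets -scale, and the bias places the threshold half a step
    below the pre-activation reached when every test on the path holds.
    """
    units = []
    for conditions, label in paths_to_leaves(tree):
        weights = [0.0] * n_features
        n_required = 0
        for feature, value in conditions:
            if value == 1:
                weights[feature] = scale
                n_required += 1
            else:
                weights[feature] = -scale
        bias = -scale * (n_required - 0.5)
        units.append((weights, bias, label))
    return units


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def activations(units, x):
    """Sigmoid activation of every path unit for one binary input vector."""
    return [
        sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias, _ in units
    ]


if __name__ == "__main__":
    units = init_hidden_units(TOY_TREE, n_features=2)
    for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
        acts = activations(units, x)
        winner = units[acts.index(max(acts))][2]
        print(x, winner)
```

With this initialization, the unit whose path matches the input is the only one with activation above 0.5, so the network starts out reproducing the tree's decisions and subsequent training only has to refine them. A larger `scale` makes the initial units closer to hard threshold tests, at the cost of smaller initial gradients.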

Similar resources

E-mail Classification by Decision Forests

We investigate the use of decision forests for automated e-mail filing into folders and junk e-mail filtering. The experiments show that decision forests offer the following advantages: (i) ability to deal with the large dimensionality of feature vectors in text categorization, (ii) improved accuracy of the ensemble over the single decision trees and favourable comparison with a number of other...

Web Documents Categorization Using Neural Networks

This paper shows, through experimental results, that artificial neural networks are good classifiers for the text categorization task. The paper compares the results of text categorization experiments using a Multilayer Perceptron, Self-organizing Maps, the C4.5 decision tree and PART decision rules. The experiments were carried out with the K1 collection of web documents.

Deep Jointly-Informed Neural Networks

In this work a novel, automated process for determining an appropriate deep neural network architecture and weight initialization based on decision trees is presented. The method maps a collection of decision trees trained on the data into a collection of initialized neural networks, with the structure of the network determined by the structure of the tree. These models, referred to as “deep jo...

Comparison of Classification Methods by Using the Reuters Database

In this paper we have focused on a frequently used data mining technique: document classification. This technique’s goal is to categorize elements. This categorization is based on a sample set, which was filled with special manually categorized elements. During the preparation methods supervised learning could be used, such as back propagation and Hopfield neural networks, inductive learning me...

Entrepreneurship policy and innovative indicators of industrial companies: Evaluation by MCDM and ANN Methods

The present paper presented a methodology for prioritizing the innovative and entrepreneurial indicators using Multi Criteria Decision Making (MCDM) and Artificial Neural Networks (ANNs), taking into account three individual, organizational and cultural dimensions simultaneously in decision making procedure. This methodology has two main advantages: first, the speed of operation in the accounti...


Journal title:
  • Informatica, Lith. Acad. Sci.

Volume 15  Issue 

Pages  -

Publication date: 2004